Coalesced pipelined SMB I/O for higher 10G throughput (v0.5.3) #22

Merged: lukekim merged 1 commit into trunk from worktree-robust-beaming-pizza on May 14, 2026

@lukekim lukekim commented May 14, 2026

Summary

  • Coalesced pipelined SMB I/O: pipelined_write and pipelined_read now build the entire batch into one BytesMut, sign each packet in-place, and emit a single write_all per batch — eliminates 64 per-packet to_vec allocations and collapses 64 write_all syscalls into 1. Encode-side CPU drops from 154 µs to 49 µs at the typical d64×64 KiB working point (3.1× faster), on top of the syscall reduction.
  • Zero-copy read decode: new decode_read_response_from_msg takes the owned response Vec and returns a Bytes slice over it — eliminates the per-response body.to_vec(), saving ~4 MiB of memcpy per 64-deep batch.
  • GetObject streaming channel sized to the SMB pipeline depth: was 4, now READ_PIPELINE_DEPTH. A full pipeline batch dumps into the channel without blocking, so back-to-back SMB read batches overlap with HTTP draining instead of serializing per-chunk.
  • Bench script upgrades (scripts/bench-live.sh): adds concurrent multi-stream PUT/GET (BENCH_CONCURRENCY, default 8) — the test that actually exercises a 10G pipe — and an optional raw mount_smbfs baseline (BENCH_MOUNT_BASELINE=1) to quantify the spiceio translation overhead against the link ceiling.
  • New micro-benches: pipelined_write_encode_coalesced and pipelined_read_decode_zerocopy track the optimized paths.
  • Bumps version to v0.5.3.

Test plan

  • make lint — fmt, clippy (strict), rustdoc warnings all clean
  • cargo test --locked — 145/145 unit tests pass (3 new tests cover the zero-copy decoder, including overflow rejection)
  • cargo bench --bench protocol_bench -- pipelined — confirms the 3.1× encode speedup at d64×64 KiB and ~2.1× at d64×1 MiB; no regression on existing benches
  • On a 10G-attached NAS: BENCH_CONCURRENCY=16 BENCH_MOUNT_BASELINE=1 ./scripts/bench-live.sh — verify the concurrent PUT/GET aggregate approaches the mount_smbfs ceiling and the single-stream numbers are no worse than before
  • CI sccache + extended + stress integration tests pass against the runner NAS

Reworks the SMB pipelined-read and pipelined-write paths to build all
batch packets into one contiguous BytesMut, sign each in-place, and
emit a single write_all per batch — eliminating 64 per-packet to_vec
allocations and collapsing 64 write_all syscalls per batch into 1.
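
The coalescing idea above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code from src/smb/client.rs: a plain `Vec<u8>` stands in for `BytesMut`, `toy_sign` is a placeholder checksum rather than real SMB signing, the 4-byte transport framing is omitted, and `encode_batch_coalesced` and `CountingWriter` are hypothetical names. The header length (64) and signature offset (48) follow the SMB2 header layout.

```rust
use std::io::{self, Write};

const HEADER_LEN: usize = 64; // SMB2 header size
const SIG_OFFSET: usize = 48; // signature field offset within the header
const SIG_LEN: usize = 16;

/// Placeholder "signature": FNV-1a over the packet with the signature
/// field zeroed. NOT real SMB signing; it only shows in-place mutation.
fn toy_sign(packet: &mut [u8]) {
    packet[SIG_OFFSET..SIG_OFFSET + SIG_LEN].fill(0);
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in packet.iter() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    packet[SIG_OFFSET..SIG_OFFSET + 8].copy_from_slice(&h.to_le_bytes());
}

/// Builds every packet of the batch into one contiguous buffer, signs each
/// packet in place, then hands the whole buffer to a single `write_all`.
fn encode_batch_coalesced<W: Write>(out: &mut W, payloads: &[&[u8]]) -> io::Result<usize> {
    let total: usize = payloads.iter().map(|p| HEADER_LEN + p.len()).sum();
    let mut batch = Vec::with_capacity(total); // one allocation per batch
    for (i, payload) in payloads.iter().enumerate() {
        let start = batch.len();
        batch.resize(start + HEADER_LEN, 0); // zeroed header
        // Toy message id in the first 8 header bytes (hypothetical layout).
        batch[start..start + 8].copy_from_slice(&(i as u64).to_le_bytes());
        batch.extend_from_slice(payload);
        toy_sign(&mut batch[start..]); // sign only the packet just appended
    }
    out.write_all(&batch)?; // one write for the whole batch
    Ok(batch.len())
}

/// Test double that counts how many times `write` is invoked.
struct CountingWriter {
    calls: usize,
    bytes: Vec<u8>,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes.extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Feeding 64 payloads through `encode_batch_coalesced` with a `CountingWriter` results in one `write` call. A per-packet loop would issue one call per packet.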

Adds a zero-copy read response decoder that slices an owned Vec into
Bytes without the prior body.to_vec() — saves ~4 MiB of memcpy per
64-deep batch at 64 KiB chunks.
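
A minimal sketch of the zero-copy decode, assuming the SMB2 READ response layout (64-byte header, then a 16-byte fixed part whose DataOffset byte and DataLength field locate the payload relative to the start of the header). It is not the actual `decode_read_response_from_msg`: `ReadPayload`, `DecodeError`, and `decode_read_payload` are hypothetical names, and an owned `Vec<u8>` plus a range stands in for a `Bytes` slice so the example needs no external crate.

```rust
use std::ops::Range;

const SMB2_HEADER_LEN: usize = 64;
const READ_RSP_FIXED_LEN: usize = 16; // fixed READ response fields before the buffer

#[derive(Debug, PartialEq)]
enum DecodeError {
    Truncated,
    OutOfBounds,
}

/// Owns the original message buffer and remembers where the payload lives,
/// so the payload bytes are never copied.
struct ReadPayload {
    buf: Vec<u8>,
    range: Range<usize>,
}

impl ReadPayload {
    fn as_slice(&self) -> &[u8] {
        &self.buf[self.range.clone()]
    }
}

/// Takes ownership of the response and returns a view over its payload.
fn decode_read_payload(msg: Vec<u8>) -> Result<ReadPayload, DecodeError> {
    let body = SMB2_HEADER_LEN;
    if msg.len() < body + READ_RSP_FIXED_LEN {
        return Err(DecodeError::Truncated);
    }
    // DataOffset: 1 byte at body+2; DataLength: little-endian u32 at body+4.
    let data_offset = msg[body + 2] as usize;
    let data_len =
        u32::from_le_bytes([msg[body + 4], msg[body + 5], msg[body + 6], msg[body + 7]]) as usize;
    // checked_add guards against wrap-around on narrow targets.
    let end = data_offset
        .checked_add(data_len)
        .ok_or(DecodeError::OutOfBounds)?;
    if data_offset < body + READ_RSP_FIXED_LEN || end > msg.len() {
        return Err(DecodeError::OutOfBounds);
    }
    Ok(ReadPayload {
        buf: msg, // moved, not copied
        range: data_offset..end,
    })
}
```

The saving quoted above follows from the batch shape: 64 responses × 64 KiB each is 4 MiB that no longer passes through `to_vec()`.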

Sizes the GetObject streaming channel to READ_PIPELINE_DEPTH so a
full pipeline batch can dump into the channel without blocking,
letting back-to-back SMB batches overlap HTTP draining.
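
The effect of the channel capacity can be shown with a bounded standard-library channel. This is a sketch under assumptions: the router presumably uses an async channel, so `std::sync::mpsc::sync_channel` is only a stand-in, the value 64 for `READ_PIPELINE_DEPTH` is inferred from the "64-deep batch" wording rather than read from src/smb/ops.rs, and `push_batch_nonblocking` is a hypothetical helper.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Assumed value; the description only implies a 64-deep pipeline.
const READ_PIPELINE_DEPTH: usize = 64;

/// Tries to push one batch of chunks into a bounded channel that nobody is
/// draining yet, and returns how many chunks fit before the producer would
/// have had to block.
fn push_batch_nonblocking(capacity: usize, batch_len: usize) -> usize {
    // `_rx` is kept alive so the channel stays connected.
    let (tx, _rx) = sync_channel::<Vec<u8>>(capacity);
    let mut accepted = 0;
    for i in 0..batch_len {
        match tx.try_send(vec![i as u8; 16]) {
            Ok(()) => accepted += 1,
            // A full channel is the point where a blocking send would stall.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}
```

With a capacity of 4, only the first 4 chunks of a 64-chunk batch are accepted before the producer would stall. With a capacity of `READ_PIPELINE_DEPTH`, the whole batch fits, which is what lets the next SMB batch start while the HTTP side drains.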

Extends bench-live.sh with concurrent multi-stream PUT/GET
(BENCH_CONCURRENCY) and an optional raw mount_smbfs baseline
(BENCH_MOUNT_BASELINE) to quantify the spiceio translation overhead
against the link ceiling. Adds matching protocol micro-benches.

Microbench (pipelined_write_encode, d64 x 64 KiB): 154 us -> 49 us,
~3.1x faster on the CPU side, on top of the 64 -> 1 syscall reduction.
Copilot AI review requested due to automatic review settings May 14, 2026 08:25

Copilot AI left a comment

Pull request overview

This PR optimizes the SMB read/write pipelining hot paths to reduce per-packet allocations, memcpys, and syscalls, and updates the S3 GetObject streaming path and benchmarking tooling to better target 10G-throughput scenarios.

Changes:

  • Coalesce pipelined SMB read/write request batches into a single BytesMut and sign packets in-place before a single write_all.
  • Add a zero-copy read-response decoder that slices payload bytes directly from the owned SMB2 message buffer.
  • Size GetObject’s streaming channel to the SMB pipeline depth and enhance live/criterion benchmarks; bump version to 0.5.3.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/smb/protocol.rs | Adds decode_read_response_from_msg and unit tests for zero-copy read payload extraction. |
| src/smb/ops.rs | Exposes READ_PIPELINE_DEPTH for cross-layer coordination (SMB ↔ HTTP streaming). |
| src/smb/client.rs | Implements coalesced pipelined read/write encoding + in-place signing; uses new zero-copy decoder in pipelined reads. |
| src/s3/router.rs | Sizes GetObject streaming channel to SMB pipeline depth to improve overlap between SMB reads and HTTP writes. |
| scripts/bench-live.sh | Adds concurrent PUT/GET benchmarks and optional mount_smbfs baseline mode. |
| benches/protocol_bench.rs | Adds micro-benches for coalesced pipelined write encoding and zero-copy pipelined read decode. |
| Cargo.toml | Version bump to 0.5.3. |
| Cargo.lock | Lockfile version bump to 0.5.3. |

Comment thread src/smb/protocol.rs
Comment thread src/s3/router.rs
Comment thread scripts/bench-live.sh
Comment thread scripts/bench-live.sh
@lukekim lukekim self-assigned this May 14, 2026
@lukekim lukekim added the enhancement New feature or request label May 14, 2026
@lukekim lukekim enabled auto-merge (squash) May 14, 2026 08:34
@lukekim lukekim merged commit 5736b5f into trunk May 14, 2026
8 checks passed
@lukekim lukekim deleted the worktree-robust-beaming-pizza branch May 14, 2026 08:37